
Cache Conda env #47454


Merged
merged 6 commits into from
Jun 27, 2022

jonashaag (Contributor)

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jonashaag marked this pull request as ready for review June 22, 2022 20:43

mroeschke (Member) left a comment:

  1. Can cache-downloads: true be set?
  2. Could you give an example before/after improvement in the setup?

mroeschke added the CI Continuous Integration label Jun 23, 2022

jonashaag (Contributor Author)

  1. Can cache-downloads: true be set?

Yes, we can. I'm not sure it's worth it, though, given that the total amount of data that can be cached per repository is 10 GB (https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy) and some of the envs are 0.5 GB in size.
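A quick back-of-the-envelope check of that limit, as a sketch: it assumes every cached environment is about 0.5 GB (the size mentioned above) and one cache entry per job. The helper name and the "doubled footprint" case (download cache roughly as large as the env) are hypothetical illustrations, not measurements.

```python
# Hypothetical helper: how many environment caches of a given size fit into
# the per-repository GitHub Actions cache limit quoted above (10 GB).
REPO_CACHE_LIMIT_GB = 10.0


def max_cached_envs(env_size_gb: float, limit_gb: float = REPO_CACHE_LIMIT_GB) -> int:
    """Return how many caches of env_size_gb fit in limit_gb before eviction."""
    if env_size_gb <= 0:
        raise ValueError("env_size_gb must be positive")
    return int(limit_gb // env_size_gb)


if __name__ == "__main__":
    # Assumption: every cached env is ~0.5 GB.
    print(max_cached_envs(0.5))  # 20
    # Assumption: cache-downloads adds a package cache of similar size per job.
    print(max_cached_envs(0.5 + 0.5))  # 10
```

So with envs of this size, roughly 20 jobs can share the budget, and enabling a download cache of comparable size would roughly halve that.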

jonashaag (Contributor Author)

2. Could you give an example before/after improvement in the setup?

Sure. Here the setup went from 2 min to 0.5 min (compare https://github.com/pandas-dev/pandas/runs/7000499676?check_suite_focus=true with https://github.com/pandas-dev/pandas/runs/7007202415?check_suite_focus=true), and here from 4.5 min to 1.5 min (compare https://github.com/pandas-dev/pandas/runs/6998916048?check_suite_focus=true with https://github.com/pandas-dev/pandas/runs/7001171440?check_suite_focus=true).
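The relative gains implied by those two pairs of runs can be worked out directly. A small sketch using only the timings quoted above; the pair labels are placeholders rather than actual job names.

```python
from typing import Tuple

# Env-setup timings reported in the comment above, in minutes: (before, after).
timings = {
    "pair 1": (2.0, 0.5),
    "pair 2": (4.5, 1.5),
}


def summarize(before_min: float, after_min: float) -> Tuple[float, float]:
    """Return (minutes saved, speedup factor) for one before/after pair."""
    return before_min - after_min, before_min / after_min


for name, (before, after) in timings.items():
    saved, speedup = summarize(before, after)
    print(f"{name}: saved {saved:.1f} min, {speedup:.1f}x faster")
```

This prints a saving of 1.5 min (4.0x) for the first pair and 3.0 min (3.0x) for the second, per job.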

In my experience, env setup can be considerably slower when there are network problems. For example, this run took 8 min, although I'm not sure how much of that is due to using Mamba instead of Micromamba: https://github.com/pandas-dev/pandas/runs/7028218282?check_suite_focus=true.

mroeschke (Member)

  1. Can cache-downloads: true be set?

Yes we can do it. I'm not sure it's worth it given that the total amount of data that can be cached per repository is 10 GB (https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy) and some of the envs are 0.5 GB large.

I see. I'd still be partial to trying it, since I've noticed we sometimes still hit connection errors from conda even though I extended the timeouts in condarc.yml. If environments get evicted from the cache on every build because of their size, then yes, I can see it not being worth it.
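Whether caches get "evicted during every build" depends on whether the total size of all job caches fits under the limit. The toy model below illustrates this under loudly stated assumptions: one cache entry per job, every build runs all jobs in the same order, and least-recently-used eviction once the total exceeds 10 GB. GitHub's actual eviction order and cache granularity may differ, and the job counts are hypothetical.

```python
from collections import OrderedDict
from typing import List


def simulate_lru_hits(job_sizes_gb: List[float], builds: int, limit_gb: float = 10.0) -> List[int]:
    """Toy model (assumed LRU eviction, one cache per job).

    Each build runs every job in order. A job restores its cache if present
    (a hit); otherwise it saves a new one. Least-recently-used entries are
    evicted until the total is back under limit_gb. Returns hits per build.
    """
    cache: "OrderedDict[int, float]" = OrderedDict()  # job -> size, LRU first
    hits_per_build = []
    for _ in range(builds):
        hits = 0
        for job, size in enumerate(job_sizes_gb):
            if job in cache:
                cache.move_to_end(job)  # mark as most recently used
                hits += 1
            else:
                cache[job] = size  # save a fresh cache entry
                # Evict oldest entries, but never the one just saved.
                while sum(cache.values()) > limit_gb and len(cache) > 1:
                    cache.popitem(last=False)
        hits_per_build.append(hits)
    return hits_per_build


print(simulate_lru_hits([0.5] * 15, builds=3))  # 7.5 GB total: [0, 15, 15]
print(simulate_lru_hits([0.5] * 25, builds=3))  # 12.5 GB total: [0, 0, 0]
```

In this model, once the combined size exceeds the limit, cyclic access makes LRU evict each cache just before it is needed again, so no build gets a hit; below the limit, every build after the first is fully cached.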

mroeschke added this to the 1.5 milestone Jun 24, 2022
mroeschke merged commit f81ac72 into pandas-dev:main Jun 27, 2022

mroeschke (Member)

Thanks @jonashaag. These CI improvements are awesome!

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* Cache Conda env

* python-version -> extra-specs

* python-version -> extra-specs

* Remove old broken conda caching

* Add cache-downloads: true

* Undo debug change
Labels
CI Continuous Integration
2 participants